EDNC: Ensemble Deep Neural Network for COVID-19 Recognition

The automatic recognition of COVID-19 is critical in the present pandemic since it relieves healthcare staff of the burden of screening for COVID-19 infection. Previous studies have shown that deep learning algorithms can aid in the diagnosis of patients with potential COVID-19 infection. However, the accuracy of current COVID-19 recognition models is relatively low. Motivated by this fact, we propose three deep learning architectures, F-EDNC, FC-EDNC, and O-EDNC, to quickly and accurately detect COVID-19 infections from chest computed tomography (CT) images. Sixteen deep neural networks have been modified and trained to recognize COVID-19 patients using transfer learning and 2458 chest CT images. The proposed EDNC has then been developed using three of the sixteen modified pre-trained models to improve the performance of COVID-19 recognition. The results suggest that the F-EDNC method significantly enhanced the recognition of COVID-19 infections with 97.75% accuracy, followed by FC-EDNC and O-EDNC (97.55% and 96.12%, respectively), which is superior to most of the current COVID-19 recognition models. Furthermore, a localhost web application has been built that enables users to easily upload their chest CT scans and obtain their COVID-19 results automatically. This accurate, fast, and automatic COVID-19 recognition system will relieve the stress on medical professionals of screening for COVID-19 infections.


Introduction
According to World Health Organization (WHO) data from 9 December 2021, the cumulative number of confirmed COVID-19 cases globally had reached 267,184,623, and the number of deaths had reached 5,277,327 [1].
WHO chief scientist Soumya Swaminathan suggested that humanity was about 60 percent of the way through the fight against the coronavirus [2]. However, unexpected obstacles can still arise, such as the sudden emergence of new variants [3]. Since the outbreak of the coronavirus pandemic, many variants of the virus have emerged. Compared to the original virus, infection with the Delta variant carries a 108% higher risk of hospital admission, a 235% higher risk of intensive care unit (ICU) admission, and a 133% higher risk of death [4]. Partial and complete vaccination can nevertheless reduce the risk of severe illness and death for all variants of concern [5]. The number of hospitalizations, ICU admissions, and deaths decreased throughout the study as vaccinations increased [4]. Swaminathan stressed, however, that while some regions of the world have very high vaccination rates of 70 to 80 percent [2], less than 4 percent of the population is vaccinated in other regions, such as Africa [6]. The longer this situation is tolerated, the more likely it is that new variants will emerge. Swaminathan called on certain countries not to promote vaccines among those who have already been vaccinated but to focus on immunizing the unvaccinated and ensuring that everyone has equitable access to the vaccine [2,7].
It is paramount for those areas that do not have access to vaccination to diagnose coronavirus patients quickly. Both quantitative reverse transcription-polymerase chain

• We propose the EDNC (F-EDNC, FC-EDNC, and O-EDNC) ensemble deep neural networks for COVID-19 recognition, which help clinicians rapidly and accurately analyze and recognize COVID-19 lung infections from chest CT scans.
• A deep neural network named CANet has been developed and built from scratch for comparative analysis with EDNC.
• Our proposed F-EDNC has achieved an accuracy of 97.55%, followed by FC-EDNC (97.14%) and O-EDNC (96.32%).
• A web application allows users to use F-EDNC easily.
The rest of this paper is structured as follows: Section 2 discusses materials and methods. Section 3 presents the results. Section 4 compares the results with state-of-the-art approaches. Section 5 concludes this study.

Materials and Methods
This section describes the methodology used to develop and implement the COVID-19 recognition model. We present deep learning methods that distinguish chest CT scans of COVID-19 patients from those of non-COVID-19 patients. The flow diagram in Figure 1 illustrates the key phases.

Main Dataset
The COVID-19 recognition task in this paper uses a CT scan dataset titled SARS-CoV-2, which is available at [23]. It contains 2481 CT scan images from patients of both sexes, collected from hospitals in Sao Paulo, Brazil. Of these CT scans, 1252 were COVID-19 positive, and 1229 were COVID-19 negative (not normal).
These CT scan images are in PNG format, with spatial resolutions ranging from 104 × 119 to 416 × 512. We selected 1229 images from each category to balance the two classes exactly. Figure 2a displays a CT scan of a COVID-19 patient from this dataset; the area indicated by the arrows is infected with COVID-19. In contrast, a CT scan of a non-COVID-19 patient is depicted in Figure 2b. Table 1 indicates information on the CT images used in this study.

Table 1. Information on the CT images used in this study.
Classes | Number of Samples | Format
COVID-19 | 1229 | PNG
Non-COVID-19 | 1229 | PNG

Alternative Dataset
To assess the generalization of the proposed deep learning models, another public CT dataset named COVIDx CT-2A [24] has been used in this paper, as shown in Figure 3. CT images in PNG format have been randomly selected from the COVID-19 and non-COVID-19 data. Table 2 indicates information on the CT images used from this dataset.
Chest CT scanning captures a series of sequential images of the patient's lungs. Infected spots may appear in some images of a series but not in others; for example, the lung is closed at the start and end of each CT image series. To detect COVID-19 symptoms effectively, samples in which the interior of the lung is clearly visible are needed. Thus, when choosing an image from each patient's chest sequence for training and validation, only images from the middle of the CT sequence are selected. Methods for automatically selecting the images of a CT sequence in which the inside of the lung is visible have been used in [25].
If a user has a CT dataset in DICOM format, a Python program can convert the DICOM images to PNG to feed the deep learning model, as follows:
Step 1: Read the DICOM image with the dicom.read_file() function.
Step 2: Apply the rescale slope and intercept information from the DICOM image header.
Step 3: Display the image in the proper range by using the window level (−600) and window width (1500) information from the image header.
Step 4: Convert the DICOM image to PNG format using the cv2.convertScaleAbs() function.
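The rescaling and windowing in Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration rather than the authors' program: the function name `window_to_uint8` and the sample pixel values are hypothetical, the sketch works on an already decoded pixel grid instead of reading a DICOM file, and the defaults are the window level and width quoted in Step 3.

```python
def window_to_uint8(raw_pixels, slope, intercept, level=-600.0, width=1500.0):
    """Map stored DICOM pixel values to 8-bit grey levels (0..255)."""
    low = level - width / 2.0    # darkest Hounsfield value that is displayed
    high = level + width / 2.0   # brightest Hounsfield value that is displayed
    out = []
    for row in raw_pixels:
        out_row = []
        for raw in row:
            hu = raw * slope + intercept   # Step 2: rescale slope and intercept
            hu = min(max(hu, low), high)   # Step 3: clip to the display window
            out_row.append(round((hu - low) / (high - low) * 255))
        out.append(out_row)
    return out


# Hypothetical 2 x 2 slice with slope 1 and intercept -1024.
sample = [[0, 424], [1174, 3000]]
png_ready = window_to_uint8(sample, slope=1.0, intercept=-1024.0)
```

Values below the window map to 0 and values above it map to 255, so the lung parenchyma occupies most of the available grey levels before the image is written to PNG.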

Data Preprocessing
First, the dataset (the selected chest CT scan dataset) is randomly split into training, validation, and testing sets in proportions of 60%, 20%, and 20%, respectively. Second, to train the deep learning model appropriately, we rescale the pixel values of the images from the range [0, 255] to the range [0, 1], in line with the pixel-value representation required in image processing [26]. The rescaling can be described as follows:
$$x_{new} = \frac{x - Min}{Max - Min} \left( Max_{new} - Min_{new} \right) + Min_{new}$$
where x is the original pixel value, Min and Max represent the pixel values of 0 and 255, and Min_new and Max_new are the new pixel values of 0 and 1. This pixel-value rescaling is applied to the training, validation, and testing datasets alike. Further, the deep learning network requires fixed-size input; thus, all CT images are resized to 224 × 224 to meet the input size requirement [27]. Moreover, a larger dataset in deep learning may yield higher classification accuracy than a smaller dataset. However, having a large dataset is not always practical [28][29][30][31]. Thus, a data augmentation approach is used to increase the volume of data without acquiring new images [32]. To augment the CT scan images used in this work, we perform geometric transformations such as image rotation and flipping.
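The preprocessing steps described above can be sketched with the standard library alone. The helper names (`min_max_rescale`, `split_indices`, `flip_horizontal`, `rotate_90`), the fixed seed, and the use of nested lists instead of image arrays are assumptions made for illustration; the exact split sizes and augmentation settings used by the authors are not stated in the text.

```python
import random


def min_max_rescale(image, old_min=0.0, old_max=255.0, new_min=0.0, new_max=1.0):
    """Linearly map pixel values from [old_min, old_max] to [new_min, new_max]."""
    span = old_max - old_min
    return [[(p - old_min) / span * (new_max - new_min) + new_min for p in row]
            for row in image]


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them roughly 60/20/20."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*image[::-1])]


train, val, test = split_indices(2458)
```

Because flipping and rotation only rearrange pixels, they enlarge the training set without changing the intensity statistics that the rescaling step has already normalized.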

Transfer Learning Models
Given the shortage of CT scans of COVID-19 patients, training a Convolutional Neural Network (CNN) from scratch may be challenging. To overcome this difficulty, we use transfer learning techniques and a range of pre-trained models [33]. The primary advantage of transfer learning is that a model can be trained with fewer samples and in less time [34], because the knowledge learned by a previously trained model is transferred to the newly trained model [35].
These models have been pre-trained on the ImageNet dataset for classification purposes. ImageNet is a freely accessible image database that contains 14 million images classified into 20,000 categories [38]. Because of the enormous dataset used to train these sixteen models, their learned weights can be reused to recognize images in the medical domain. The sixteen models are used as the base models. Their feature extraction layers (convolutional and pooling layer pairs) are frozen to keep their ImageNet-optimized weights, which avoids information loss and preserves the feature extraction capability for the subsequent COVID-19 training. Then, the original fully connected layers of the pre-trained models are removed, and the following layers are added to classify COVID-19 CT scans:

• An average pooling layer, which produces a down-sampled feature map by averaging the values of all pixels in each patch of the feature map; the calculating procedure is shown in Figure 4. The output size of the pooling layer is calculated as follows:
$$W_O = \frac{W_I - F}{S} + 1$$
where W_O is the output size, W_I represents the input size of the pooling operation, F is the size of the pooling filter, and S is the stride size;
• A flatten layer to convert the down-sampled feature map to a one-dimensional array;
• A fully connected layer with 64 neurons and a Rectified Linear Unit (ReLU) activation to connect each neuron in the layers before and after. ReLU helps to mitigate the problem of vanishing gradients. It is calculated using the equation below:
$$f(x) = \max(0, x)$$
• A dropout layer with a 0.5 dropout ratio to mitigate model overfitting;
• An output layer with Softmax activation to identify whether a CT scan is positive or negative for COVID-19 [39][40][41][42]. Unlike ReLU, Softmax is frequently used for classification in the last layer of a model. It can be written as the following equation:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where z_i is the input to the i-th output neuron.
These newly added layers within the modified models are trained using the COVID-19 dataset. After each training epoch, the models are validated against a validation set. Since this research examines a binary classification problem (COVID-19 and non-COVID-19 classes), the number of neurons at the output layer was, thus, set to two. Figure 5 shows the modified architecture of the sixteen pre-trained models used in this research.
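The three formulas used by the added layers (pooling output size, ReLU, and Softmax) can be checked with a few lines of Python. This is only a sketch: the function names and the example scores are hypothetical, and the Softmax subtracts the maximum score purely for numerical stability, which does not change the result.

```python
import math


def pool_output_size(w_in, f, s):
    """Output width of a pooling filter of size f moved with stride s."""
    return (w_in - f) // s + 1


def relu(x):
    """Rectified Linear Unit: pass positive values, zero out the rest."""
    return max(0.0, x)


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(z - top) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores of the two output neurons for one CT image.
probs = softmax([2.0, 0.5])
```

With two output neurons, the larger score always receives the larger probability, so taking the most probable class is equivalent to taking the neuron with the highest raw score.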

Figure 5. The architecture of the modified pre-trained model.

The Proposed EDNC Architectures
The accuracy of disease prediction is critical in the medical field since erroneous decisions result in high costs and risks to human life. The drawback of using the predictions of several deep learning classifiers independently is that they exhibit a significant degree of variance. Because each model is designed differently and trained independently, the models update their weights separately and provide inconsistent results when asked to categorize the same data. These problems may be addressed by an ensemble of individual models, which decreases variance and generalizes better than the individual models [43].
The ensemble deep neural network for COVID-19 recognition (EDNC) models are built by combining three pre-trained models. The three best-performing of the sixteen pre-trained models are chosen to form the ensemble. As in the individual transfer learning models, the feature extraction layers of each of the three pre-trained models are set to be untrainable, which prevents their weights from being changed while the new model is trained. The next sections detail the architectures of the three types of combined models.

F-EDNC
The primary goal of the feature-ensemble deep neural network for COVID-19 recognition (F-EDNC) is to gather more features from the input to the model. The proposed feature-ensemble technique therefore combines the features that several models extract from the same CT scan input. It comprises the three pre-trained models with the highest accuracy for identifying COVID-19 images. Let $I_{input} = \{i_1, i_2, \ldots, i_n\}$ be the input COVID-19 image dataset, and let $w_{F1}, w_{F2}, \ldots, w_{Fn}$ be the features extracted by the transfer learning models from the same input image; in this case, the number of models n is 3. The feature ensemble merges these features from the different pre-trained models.
These three models were truncated after their feature extraction layers, and average pooling layers were added. Following that, an ensemble layer was created to merge the outputs of these three feature extraction branches and obtain richer feature information. Then, three layers were added to complete the feature ensemble model: a flatten layer, a fully connected layer, and an output layer. The loss function we used for our model is categorical cross-entropy, which can be represented as the following equation:
$$L = -\sum_{j} y_j \log(\hat{y}_j)$$
where j represents the label, y are the target values, and ŷ are the predicted values.
The architecture of the F-EDNC model is illustrated in Figure 6.
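As a small worked example of the categorical cross-entropy loss named above, the following sketch computes the loss of a single sample. The function name, the `eps` guard against taking the logarithm of zero, and the example probabilities are assumptions added for illustration and are not taken from the paper.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss of one sample: minus the sum over labels j of y_j * log(y_hat_j)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot target for the first class and two hypothetical predictions.
loss_confident = categorical_cross_entropy([1.0, 0.0], [0.9, 0.1])
loss_wrong = categorical_cross_entropy([1.0, 0.0], [0.2, 0.8])
```

A prediction that puts most of its probability on the correct label yields a small loss, while a prediction that favors the wrong label is penalized heavily, which is what drives the added layers toward the correct class during training.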

FC-EDNC
The fully connected-ensemble deep neural network for COVID-19 (FC-EDNC) combines the fully connected layers of three pre-trained models to create an ensemble model with 386 trainable parameters. The output of the fully connected layer of each pre-trained model, $w_{FCn}$, is used as a distinct input to the proposed model; in this case, n is 3. The fully connected layer ensemble is thus formed from $w_{FC1}$, $w_{FC2}$, and $w_{FC3}$ of the different pre-trained models.

The output of fully connected layers is concatenated to generate a more accurate probability of identifying COVID-19 CT scans. The architecture of the FC-EDNC model is indicated in Figure 7.

O-EDNC
The output-ensemble deep neural network for COVID-19 (O-EDNC) approach is accomplished by assembling three pre-trained models at the output layer, with 14 trainable parameters. Let $w_{O1}, w_{O2}, \ldots, w_{On}$ be the outputs of the individual pre-trained models (here, n is 3); the output layer ensemble merges these outputs from the different pre-trained models.
This method assumes that the ensemble model may learn more characteristics from this merged output and thus make more precise predictions. The architecture of the O-EDNC model is shown in Figure 8.
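The trainable parameter counts quoted for FC-EDNC (386) and O-EDNC (14) can be reproduced with simple arithmetic. The sketch below assumes that the three model outputs are concatenated and fed to a single two-neuron output layer, and that each fully connected layer has 64 neurons; the text does not spell out this derivation, so it should be read as a plausible consistency check rather than the authors' own calculation.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_models = 3     # pre-trained models in the ensemble
fc_units = 64    # assumed width of each model's added fully connected layer
n_classes = 2    # COVID-19 and non-COVID-19

# FC-EDNC: three 64-value vectors concatenated, then a 2-neuron output layer.
fc_ednc_params = dense_params(n_models * fc_units, n_classes)

# O-EDNC: three 2-value outputs concatenated, then a 2-neuron output layer.
o_ednc_params = dense_params(n_models * n_classes, n_classes)
```

Under these assumptions the counts come out to 386 and 14, matching the figures given for the two architectures.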

CANet: A Self-Built CNN Model for Comparative Analysis
We propose a CNN model built from scratch to compare with the pre-trained and ensemble models in terms of model performance, training time, and model complexity in the COVID-19 recognition task. This proposed CNN model, CANet, is constructed using three convolutional and max-pooling layers. The 2D convolutional operation can be written as follows:
$$(I * K)(i, j) = \sum_{m} \sum_{n} I(i - m, j - n) \, K(m, n)$$
where I represents the input image, K is the kernel, and * represents the convolution operation. The number of filters in the first convolutional layer is set to 16 and raised to 64 and 128 in the subsequent convolutional layers. The filter size is set to 3 × 3. Each activation unit in the convolutional layers uses ReLU activation. The output size of the convolutional layer is calculated using the following equation:
$$W_O = \frac{W_I - F + 2P}{S} + 1$$
where W_O is the output size, W_I represents the input size of the convolution operation, F is the size of the convolution filter, S is the stride size, and P is the padding size. The architecture of the CANet model is shown in Figure 9.

Three layers (a flatten layer, a fully connected layer, and a dropout layer) and a Softmax activation function are added to complete this CNN architecture. The model is trained and assessed for 50 epochs using the same COVID-19 dataset as the transfer learning models.
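To make the convolution and its output size concrete for CANet's 3 × 3 filters, here is a minimal pure-Python sketch. The function names are hypothetical, the input is assumed to be square, and the example uses stride 1 with no padding because the paper does not state CANet's stride or padding. The kernel is flipped, as in the mathematical definition of convolution; deep learning frameworks usually skip the flip, which makes no difference for learned kernels.

```python
def conv_output_size(w_in, f, s=1, p=0):
    """Output width for filter size f, stride s, and padding p."""
    return (w_in - f + 2 * p) // s + 1


def conv2d(image, kernel):
    """2D convolution with a flipped kernel, stride 1 and no padding."""
    f = len(kernel)
    size = conv_output_size(len(image), f)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            out[i][j] = sum(
                image[i + m][j + n] * kernel[f - 1 - m][f - 1 - n]
                for m in range(f)
                for n in range(f)
            )
    return out


image = [[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [0, 0, 0, 0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # copies the center pixel
result = conv2d(image, identity)
```

A 3 × 3 filter without padding shrinks each side by two pixels, so a 224 × 224 input would become 222 × 222, whereas a padding of one pixel would keep it at 224 × 224.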

Web Application Workflow
A web application is built on the Flask framework, enabling users to upload their CT images easily and obtain their COVID-19 results quickly. The architecture of the frontend and backend of the COVID-19 recognition system is shown in Figure 10.

Figure 10. The architecture of the COVID-19 recognition system.
The following steps detail the functionality of the web application.
Step 1. The user visits the web application and uploads a CT image.
Step 2. The submitted image is sent to the backend, where the proposed F-EDNC model is deployed. The image is resized to 224 × 224 and converted to a NumPy array containing the pixel intensities before being fed into the model.
Step 3. The F-EDNC model has been saved in HDF5 format in the backend and is loaded by the model.load() function to process the input image.
Step 4. The output is calculated by the predict() function as a NumPy array of size (2,1), which contains the probabilities of the two classes. The class with the highest probability is then retrieved.
Step 5. The result is displayed at the front end.
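Step 4, retrieving the class with the highest probability, can be sketched as follows. The label order in `CLASS_NAMES`, the function name `decode_prediction`, and the example probabilities are hypothetical; the actual order depends on how the classes were encoded during training, which the text does not specify.

```python
CLASS_NAMES = ["COVID-19", "non-COVID-19"]  # hypothetical label order


def decode_prediction(probabilities, class_names=CLASS_NAMES):
    """Return the most probable class and its probability."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return class_names[best], probabilities[best]


# Hypothetical model output for one uploaded CT image.
label, confidence = decode_prediction([0.93, 0.07])
```

Returning the probability together with the label lets the front end show how confident the model is rather than only a yes or no answer.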

Technology Used in Building the Localhost Web Application
Technologies such as Python, Keras, TensorFlow, NumPy, and Pandas are used in building the backend model. The Flask framework routes the web pages and hosts the web server in Python. The advantage of using Flask is that a web application can be built in a single Python file; moreover, it reduces the amount of JavaScript and jQuery code required. To develop a web application that recognizes CT scan images, we create two routes in the Flask application: an index route, where users upload their image file, and a prediction route, which runs the saved model on the uploaded image. Furthermore, Bootstrap was used as the CSS stylesheet for the web pages. Bootstrap is a CSS framework that provides pre-built CSS classes. It helps make the pages of the web application responsive so that they work well in mobile browsers.

Confusion Matrix
A confusion matrix is a table, as shown in Figure 11, that summarizes the results of a classification problem prediction [44]. The numbers of right and wrong predictions are counted and classified in four distinct ways as follows:
True Positive (TP): The prediction and the actual output are both positive.
False Positive (FP): The prediction is positive, but the actual output is negative.
True Negative (TN): Both the prediction and the actual result are negative.
False Negative (FN): The prediction is negative, while the actual result is positive.
Figure 11. A representation of the confusion matrix.

Classification Metrics
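The following five metrics were used to evaluate the model's performance [44]:
Accuracy is the ratio of correct predictions to all predictions:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision indicates how accurately a model classifies a sample as positive:
$$Precision = \frac{TP}{TP + FP}$$
Sensitivity (recall) refers to a model's ability to recognize positive samples:
$$Sensitivity = \frac{TP}{TP + FN}$$
F1-score balances precision and recall as their harmonic mean:
$$F1 = 2 \times \frac{Precision \times Sensitivity}{Precision + Sensitivity}$$
Specificity is the proportion of negative samples that are correctly identified as negative:
$$Specificity = \frac{TN}{TN + FP}$$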
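The five metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for a test set of 200 images.
m = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

In a screening setting, sensitivity deserves particular attention because a false negative means an infected patient is missed, whereas a false positive only triggers further testing.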

Results of Sixteen Modified Pre-Trained Models
The main dataset is randomly partitioned into training, validation, and test subsets of 60%, 20%, and 20%, respectively. The proposed models are trained using the training data and validated against the validation set after each training epoch. Then, we use the testing dataset to evaluate the models and quantify their performance using the evaluation metrics. As shown in Table 3, the pre-trained models MobileNet, DenseNet201, and ResNet50V2 ranked in the top three for prediction accuracy with 95.71%, 93.47%, and 93.47%, followed by ResNet101V2, ResNet152V2, MobileNetV2, and NASNet, all of which achieved greater than 90% accuracy. Other models such as InceptionResNetV2, VGG16, Xception, InceptionV3, and VGG19 provided acceptable results with more than 80% accuracy. ResNet50 and ResNet101 achieved approximately 73% accuracy, whereas the worst outcomes were obtained by MobileNetV3Small and EfficientNetB7, which provided an accuracy of 50%. Other measures such as precision, recall, and F1-score are detailed in Table 3.

Learning Curve Results
Accuracy and loss curves for the sixteen pre-trained models during training and validation are shown in Figure 13. The graphs show that MobileNet has the lowest loss rate of 11.17% and the highest accuracy rate of 95.51%, followed by DenseNet201 and ResNet50V2 with loss rates of 15.93% and 22.59%, respectively. All validation curves oscillate more than the training curves. This is because the validation dataset is small relative to the training dataset, so each validation estimate is noisier. The plots also indicate that the validation data yield better accuracy and a lower loss rate than the training data. This is because a dropout of 0.5 is used in model training, which means that 50% of the features are set to zero during training, whereas all neurons are used in validation, which results in better validation accuracy.

Classification Results
Three ensemble strategies have been applied to recognize COVID-19 CT images. Table 4 shows that all three EDNC models outperform the individual pre-trained models in predicting COVID-19 lung infections. The accuracy, precision, specificity, and F1-score were improved by 4.08%, 6.03%, 5.72%, and 3.74%, respectively. However, the pre-trained MobileNet still holds the highest sensitivity of 99.56%. Among the three ensemble models, the F-EDNC model obtained the best accuracy of 97.55% and the highest recall of 96.41%, while FC-EDNC holds the highest F1-score of 97.18%, the highest specificity of 98.33%, and the highest precision of 98.37%. The CANet model achieved an accuracy of 91.63%, outperforming most of the pre-trained models. The classification results on the alternative dataset can be found in Table 5, which shows that F-EDNC obtained the best accuracy of 97.83% and the highest sensitivity of 100%.

Confusion Matrix Results
The confusion matrices in Figure 14 show that the proposed ensemble models make significantly fewer misclassifications than the single pre-trained models. The proposed F-EDNC misclassified only 12 of the 490 CT images (97.55% accuracy). The FC-EDNC model correctly classified 476 of the 490 CT images (97.14% accuracy), and O-EDNC correctly identified 472 CT images (96.32% accuracy). The CANet model misclassified 41 CT images (91.63% accuracy), performing worse than the ensemble models.
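The accuracies quoted above can be recomputed from the image counts given in the text. The following sketch only uses those counts and the test-set size of 490; the variable and function names are illustrative.

```python
TOTAL_TEST_IMAGES = 490

# Correctly classified images, derived from the counts quoted in the text.
correct = {
    "F-EDNC": TOTAL_TEST_IMAGES - 12,   # 12 images misclassified
    "FC-EDNC": 476,
    "O-EDNC": 472,
    "CANet": TOTAL_TEST_IMAGES - 41,    # 41 images misclassified
}

def accuracy_percent(n_correct, n_total=TOTAL_TEST_IMAGES):
    """Accuracy as a percentage of correctly classified images."""
    return 100.0 * n_correct / n_total

accuracies = {name: accuracy_percent(n) for name, n in correct.items()}
for name, acc in accuracies.items():
    print(f"{name:8s} {acc:.2f}%")
```

For O-EDNC, 472 / 490 evaluates to 96.3265...%, so the quoted 96.32% matches when the value is truncated rather than rounded to two decimals.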

False Discovery Rate Results
The false discovery rate (FDR) is the proportion of false discoveries among all discoveries, that is, the fraction of positive predictions that are false positives. With FP denoting false positives and TP denoting true positives, the FDR is defined as follows:

$$\mathrm{FDR} = \frac{FP}{FP + TP}$$

Table 6 lists the FDR of three pre-trained models and all ensemble models. F-EDNC obtained the lowest FDR of 1.22%.
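As a minimal sketch of this calculation, the snippet below computes FP / (FP + TP) and shows that the FDR is the complement of precision. The function name `false_discovery_rate` and the counts are hypothetical and are not taken from Table 6.

```python
def false_discovery_rate(tp, fp):
    """FDR = FP / (FP + TP): share of positive predictions that are wrong."""
    if tp + fp == 0:
        raise ValueError("FDR is undefined when there are no positive predictions")
    return fp / (fp + tp)

# Hypothetical counts (not taken from Table 6).
tp, fp = 90, 10
fdr = false_discovery_rate(tp, fp)
precision = tp / (tp + fp)

print(f"FDR = {fdr:.2%}, precision = {precision:.2%}")
```

Since FDR = 1 - precision, a lower FDR means that a larger share of the positive predictions is correct.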

The EDNC models show much better learning curves than the individual pre-trained models, as shown in Figure 15. The F-EDNC model has the lowest loss of 3.42% and the highest accuracy of 98.92%, followed by O-EDNC and FC-EDNC with losses of 8.9% and 18.91% and accuracies of 96.89% and 92.18%, respectively. Furthermore, once the training and validation losses have stabilized, the difference between the final training and validation values is minimal for F-EDNC and O-EDNC, suggesting that both models fit well. For FC-EDNC, in contrast, the gaps between the training and validation values in the accuracy and loss curves are noticeably wider, which is less promising. The CANet model shows small gaps between the training and validation values in the accuracy and loss curves; however, its validation curve oscillates, which indicates that the validation sample is too small for CANet. Based on the above results, we conclude that F-EDNC performs best in categorizing chest CT scan images.

Classification Results of Five Runs for Pre-Trained Model and EDNC Model
To provide a more reliable assessment of model performance than a single hold-out validation, we repeated the whole process (random dataset splitting, training, validation, and testing) five times on the main dataset and averaged the findings to obtain a more consistent result. Tables 7 and 8 show the averaged test results of the proposed models over the five hold-out runs.
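The repeated hold-out procedure can be sketched as follows. The split fractions (60/20/20), the seeds, and the function names are assumptions made for illustration, and `train_and_evaluate` is a hypothetical placeholder that returns a dummy score instead of training a network.

```python
import random
import statistics

def split_indices(n_samples, seed, train_frac=0.6, val_frac=0.2):
    """Randomly split sample indices into train/validation/test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

def train_and_evaluate(train_idx, val_idx, test_idx, seed):
    """Hypothetical placeholder: train a model and return its test accuracy.

    A real implementation would fit the network on train_idx, tune it on
    val_idx, and score it on test_idx. A dummy score is returned here so
    that the averaging logic can be run end to end.
    """
    return 0.95 + random.Random(seed).uniform(-0.01, 0.01)

def repeated_holdout(n_samples, n_runs=5):
    """Run the split/train/test cycle n_runs times and average the scores."""
    scores = []
    for run in range(n_runs):
        train_idx, val_idx, test_idx = split_indices(n_samples, seed=run)
        scores.append(train_and_evaluate(train_idx, val_idx, test_idx, seed=run))
    return statistics.mean(scores), statistics.stdev(scores)

mean_acc, std_acc = repeated_holdout(n_samples=2458, n_runs=5)
print(f"mean accuracy over 5 runs: {mean_acc:.4f} (std {std_acc:.4f})")
```

Reporting the standard deviation alongside the mean indicates how sensitive the result is to the particular random split.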

Training Time and Model Size Results
Table 9 shows that MobileNet required the least time to complete one training epoch, whereas EfficientNetB7 took 34 s, the longest time for one epoch. The size of each model's weights is shown in Table 4 as well. MobileNetV3Small has the smallest size of 13.26 MB, whereas the proposed F-EDNC has the largest model size of 377.2 MB.

Model Deployment Result
We deployed our COVID-19 recognition system as a localhost web application. The F-EDNC model was chosen for deployment because it has the highest average of accuracy, precision, sensitivity, specificity, and F1-score.
As shown in Figure 16, a simple HTML web page was created to allow users to obtain COVID-19 results. Users can upload a CT scan image by clicking the "Choose File" button. Once the user clicks the "Predict!" button, the image is sent to the system backend, where the proposed model predicts the COVID-19 condition from the input image. The result (COVID-19 or Non-COVID-19) is then displayed on the front end. (The code is available at https://github.com/rgiol/code, access date: 4 January 2022).

Discussion
Using deep learning approaches to recognize COVID-19 is a hot topic that has attracted much attention recently. Exciting results have been reported in this field and continue to emerge, drawing on a variety of neural networks. CT scan images are among the critical data types used to identify COVID-19 symptoms. Numerous deep learning models have been created and effectively deployed for identifying COVID-19. In this study, twenty state-of-the-art techniques were selected for comparison purposes. After reviewing their methods and findings, we believe that there are still several research gaps in COVID-19 recognition compared to our study. Some of the most critical ones are listed as follows:

• The majority of studies utilized a dataset of only a few hundred COVID-19 images, which is inadequate for developing accurate and robust deep learning methods. Insufficient data may affect the performance of proposed methods.
• In most studies, there was a data imbalance problem, with one class having more images than the other. This affects the accuracy of models.
• Additionally, there are still some other pre-trained models that have not been utilized in COVID-19 classification.

• The impact of different ensemble methods has not received adequate attention in COVID-19 research. It should be emphasized that these techniques are beneficial both for improving performance and for dealing with the uncertainty associated with deep learning models.
• None of the studies provided a web page where users can upload images and obtain COVID-19 predictions.
In contrast, our study used a perfectly balanced CT scan dataset with more than two thousand chest CT images. Sixteen pre-trained deep learning models were investigated, including some not previously employed for COVID-19 detection. Furthermore, three ensemble models were proposed to recognize COVID-19 CT images. The findings of each model are summarized in Table 10. The model proposed in our study outperforms most of the existing classifiers.

Conclusions
This paper applies transfer learning to modify and build sixteen deep learning models for COVID-19 recognition from chest CT scans. Three ensemble deep neural networks (F-EDNC, FC-EDNC, and O-EDNC) were proposed to further enhance the performance of those sixteen deep learning models on a dataset containing 2458 CT scans. CANet, a self-built CNN model, was designed and trained on the same dataset. The performance of the proposed EDNC was evaluated and compared with that of CANet and the sixteen modified pre-trained models. The results show that EDNC outperformed the pre-trained models and CANet in COVID-19 image classification.
Among the proposed models, F-EDNC achieves an accuracy of 97.75%, a sensitivity of 97.95%, a precision of 97.55%, a specificity of 97.56%, and an F1 score of 97.75%. Additionally, the proposed F-EDNC is deployed through a web application, enabling users to easily use the COVID-19 recognition system. Despite the excellent performance of the proposed COVID-19 recognition system, this study has several limitations. Firstly, if a user derives a 2D image from a 3D CT scan, the classification result may vary depending on which 2D image is selected. Secondly, this study did not use other preprocessing techniques such as image enhancement. In future work, image enhancement may be used to determine whether the results can be improved further. In this study, the proposed EDNC significantly improved COVID-19 recognition performance, indicating the possibility of a fully automated and quick diagnosis of COVID-19 using deep learning. This finding could save time and money for healthcare professionals screening for COVID-19 infections.